AITopics | high-resolution image

Collaborating Authors

high-resolution image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Neural Information Processing SystemsJun-19-2026, 11:02:27 GMT

Recent advancements in vision-language models (VLMs) have improved performance by increasing the number of visual tokens, which are often significantly longer than text tokens. However, we observe that most real-world scenarios do not require such an extensive number of visual tokens. While the performance drops significantly in a small subset of OCR-related tasks, models still perform accurately in most other general VQA tasks with only 1/4 resolution. Therefore, we propose to dynamically process distinct samples with different resolutions, and present a new paradigm for visual token reduction, namely, VisionThink. It starts with a downsampled image and smartly decides whether it is sufficient for problem solving.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

FINERS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning

Neural Information Processing SystemsJun-17-2026, 18:22:29 GMT

Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities across a wide range of vision-language tasks. However, due to the restricted input resolutions, MLLMs face significant challenges in precisely understanding and localizing visual details in high-resolution images--particularly when dealing with extra-small objects embedded in cluttered contexts. To address this issue, we propose FINERS, a two-stage MLLM-based reinforcement learning framework for jointly reasoning and segmenting extremely small objects within high-resolution scenes. FINERS adopts a coarse-to-fine pipeline comprising Global Semantic Exploration (GSE) and Localized Perceptual Refinement (LPR). Specifically, GSE performs instruction-guided reasoning to generate a textural response and a coarse target region, while LPR refines this region to produce an accurate bounding box and segmentation mask. To couple the two stages, we introduce a locate-informed retrospective reward, where LPR's outputs are used to optimize GSE for more robust coarse region exploration.

large language model, machine learning, segmentation, (19 more...)

Neural Information Processing Systems

Country:

Asia > China (0.46)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

Neural Information Processing SystemsJun-9-2026, 20:57:21 GMT

Recent advances in self-supervised learning for Vision Transformers (ViTs) have fueled breakthroughs in remote sensing (RS) foundation models. However, the quadratic complexity of self-attention poses a significant barrier to scalability, particularly for large models and high-resolution images. While the linear-complexity Mamba architecture offers a promising alternative, existing RS applications of Mamba remain limited to supervised tasks on small, domain-specific datasets. To address these challenges, we propose RoMA, a framework that enables scalable self-supervised pretraining of Mamba-based RS foundation models using large-scale, diverse, unlabeled data. RoMA enhances scalability for high-resolution images through a tailored auto-regressive learning strategy, incorporating two key innovations: 1) a rotation-aware pretraining mechanism combining adaptive cropping with angular embeddings to handle sparsely distributed objects with arbitrary orientations, and 2) multi-scale token prediction objectives that address the extreme variations in object scales inherent to RS imagery. Systematic empirical studies validate that Mamba adheres to RS data and parameter scaling laws, with performance scaling reliably as model and data size increase. Furthermore, experiments across scene classification, object detection, and semantic segmentation tasks demonstrate that RoMA-pretrained Mamba models consistently outperform ViT-based counterparts in both accuracy and computational efficiency. The source code and pretrained models have be released at https://github.com/MiliLab/RoMA.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis

Neural Information Processing SystemsMar-20-2026, 19:45:13 GMT

As text-to-image (T2I) synthesis models increase in size, they demand higher inference costs due to the need for more expensive GPUs with larger memory, which makes it challenging to reproduce these models in addition to the restricted access to training datasets. Our study aims to reduce these inference costs and explores how far the generative capabilities of T2I models can be extended using only publicly available datasets and open-source models. To this end, by using the de facto standard text-to-image model, Stable Diffusion XL (SDXL), we present three key practices in building an efficient T2I model: (1) Knowledge distillation: we explore how to effectively distill the generation capability of SDXL into an efficient U-Net and find that self-attention is the most crucial part.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Vision (0.57)

Add feedback

eeb3df2d70affd52f65ff3b9abb32487-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-18-2026, 15:28:46 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > China > Shaanxi Province > Xi'an (0.06)
Europe > Germany > Brandenburg > Potsdam (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry:

Law (0.70)
Government (0.67)
Energy > Renewable (0.31)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
(2 more...)

Add feedback

d94e778e4cfabddec0cb6d38ab57accd-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 08:25:28 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

A Unified Conditional Framework for Diffusion-based Image Restoration

Neural Information Processing SystemsFeb-16-2026, 02:52:38 GMT

Image restoration tasks are ill-posed problems whose solutions are not unique.

artificial intelligence, diffusion model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Optimal Transport-Guided Conditional Score-Based Diffusion Model Xiang Gu1, Liwei Y ang

Neural Information Processing SystemsFeb-14-2026, 09:23:11 GMT

Conditional score-based diffusion model (SBDM) is for conditional generation of target data with paired data as condition, and has achieved great success in image translation. However, it requires the paired data as condition, and there would be insufficient paired data provided in real-world applications.

artificial intelligence, machine learning, target data, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

Huaibo Huang, zhihang li, Ran He, Zhenan Sun, Tieniu Tan

Neural Information Processing SystemsFeb-12-2026, 04:47:49 GMT

We present a novel introspective variational autoencoder (IntroVAE) model for synthesizing high-resolution photographic images. IntroVAE is capable of selfevaluating the quality of its generated samples and improving itself accordingly.

artificial intelligence, inneurip, machine learning, (18 more...)

Neural Information Processing Systems

Country: